{
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "<i>Copyright (c) Microsoft Corporation.</i>\n",
    "\n",
    "<i>Licensed under the MIT License.</i> "
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "# Automated Machine Learning (AutoML) on Azure for Retail Sales Forecasting\n",
    "\n",
    "This notebook demonstrates how to apply [AutoML in Azure Machine Learning services](https://docs.microsoft.com/en-us/azure/machine-learning/concept-automated-ml) to train and tune machine learning models for forecasting product sales in retail. We will use the Orange Juice dataset to illustrate the steps of utilizing AutoML as well as how to combine an AutoML model with a custom model for better performance.\n",
    "\n",
    "AutoML is a process of automating the tasks of machine learning model development. It helps data scientists and other practitioners build machine learning models with high scalability and quality in less amount of time. AutoML in Azure Machine Learning allows you to train and tune a model using a target metric that you specify. This service iterates through machine learning algorithms and feature selection approaches, producing a score that measures the quality of each machine learning pipeline. The best model will then be selected based on the scores. For more technical details about Azure AutoML, please check [this paper](https://papers.nips.cc/paper/7595-probabilistic-matrix-factorization-for-automated-machine-learning.pdf).\n",
    "\n",
    "This notebook uses [Azure ML SDK](https://docs.microsoft.com/en-us/python/api/overview/azureml-sdk/?view=azure-ml-py) which is included in the `forecasting_env` conda environment. If you are running in Azure Notebooks or another Microsoft managed environment, the SDK is already installed. On the other hand, if you are running this notebook in your own environment, please follow [SDK installation instructions](https://docs.microsoft.com/azure/machine-learning/service/how-to-configure-environment) to install the SDK."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Global Settings and Imports"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 1,
   "metadata": {},
   "outputs": [],
   "source": [
    "%load_ext autoreload\n",
    "%autoreload 2\n",
    "%matplotlib inline"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 2,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "System version: 3.6.10 |Anaconda, Inc.| (default, Jan  7 2020, 21:14:29) \n",
      "[GCC 7.3.0]\n",
      "This notebook was created using version 1.0.85 of the Azure ML SDK\n",
      "You are currently using version 1.0.85 of the Azure ML SDK\n"
     ]
    }
   ],
   "source": [
    "import os\n",
    "import sys\n",
    "import math\n",
    "import warnings\n",
    "import datetime\n",
    "import logging\n",
    "import azureml.core\n",
    "import azureml.automl\n",
    "import pandas as pd\n",
    "\n",
    "from matplotlib import pyplot as plt\n",
    "from fclib.common.utils import git_repo_path\n",
    "from fclib.azureml.azureml_utils import (\n",
    "    get_or_create_workspace,\n",
    "    get_or_create_amlcompute,\n",
    ")\n",
    "from fclib.dataset.ojdata import download_ojdata, FIRST_WEEK_START\n",
    "from fclib.common.utils import align_outputs\n",
    "from fclib.evaluation.evaluation_utils import MAPE\n",
    "from fclib.models.multiple_linear_regression import fit, predict\n",
    "\n",
    "from azureml.core import Workspace\n",
    "from azureml.core.dataset import Dataset\n",
    "from azureml.core.experiment import Experiment\n",
    "from automl.client.core.common import constants\n",
    "from azureml.train.automl import AutoMLConfig\n",
    "from azureml.core.compute import ComputeTarget, AmlCompute\n",
    "from azureml.automl.core._vendor.automl.client.core.common import metrics\n",
    "\n",
    "warnings.filterwarnings(\"ignore\")\n",
    "\n",
    "print(\"System version: {}\".format(sys.version))\n",
    "print(\"This notebook was created using version 1.0.85 of the Azure ML SDK\")\n",
    "print(\"You are currently using version\", azureml.core.VERSION, \"of the Azure ML SDK\")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 3,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Use False if you've already downloaded and split the data\n",
    "DOWNLOAD_SPLIT_DATA = True\n",
    "\n",
    "# Data directory\n",
    "DATA_DIR = os.path.join(git_repo_path(), \"ojdata\")\n",
    "\n",
    "# Forecasting settings\n",
    "GAP = 2\n",
    "LAST_WEEK = 138\n",
    "\n",
    "# Number of test periods\n",
    "NUM_TEST_PERIODS = 3\n",
    "\n",
    "# Column names\n",
    "time_column_name = \"week_start\"\n",
    "target_column_name = \"move\"\n",
    "grain_column_names = [\"store\", \"brand\"]\n",
    "index_column_names = [time_column_name] + grain_column_names\n",
    "\n",
    "# Subset of stores used in the notebook\n",
    "USE_STORES = [2, 5, 8]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Set up Azure Machine Learning Workspace\n",
    "\n",
    "An Azure ML workspace is an Azure resource that organizes and coordinates the actions of many other Azure resources to assist in executing and sharing machine learning workflows. In particular, an Azure ML workspace coordinates storage, databases, and compute resources providing added functionality for machine learning experimentation, deployment, inference, and the monitoring of deployed models. To create an Azure ML workspace, first you need access to an Azure subscription. An Azure subscription allows you to manage storage, compute, and other assets in the Azure cloud. You can [create a new subscription](https://azure.microsoft.com/en-us/free/) or access existing subscription information from the [Azure portal](https://portal.azure.com/). Given that you have access to your Azure subscription, you can further create an Azure ML workspace by following the instructions [here](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-workspace). You can also do so [using Azure CLI](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-manage-workspace-cli) or the `Workspace.create()` method in Azure SDK.\n",
    "\n",
    "Once you have created an Azure ML workspace, you can download its configuration file (`config.json`) from Azure Portal as follows\n",
    "\n",
    "<img src=\"https://user-images.githubusercontent.com/20047467/76651752-8827b180-653b-11ea-942d-99cf0bdc4f96.png\" width=\"900\" height=\"320\">"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Prepare Azure ML Workspace\n",
    "\n",
    "In the following cell, `get_or_create_workspace()` creates a workspace object from the details stored in `config.json` that you have downloaded. We assume that you store this config file to a directory `./.azureml`. In case the existing workspace cannot be loaded, the following cell will try to create a new workspace with the subscription ID, resource group, and workspace name as specified in the beginning of the cell.\n",
    "\n",
    "The cell can fail if you don't have permission to access the workspace. You may need to log into your Azure account and change the default subscription to the one which the workspace belongs to using Azure CLI `az account set --subscription <name or id>`."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 6,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Workspace preparation succeeded.\n"
     ]
    }
   ],
   "source": [
    "# Please specify the AzureML workspace attributes below if you want to create a new one.\n",
    "subscription_id = \"<subscription-id>\"\n",
    "resource_group = \"<resource-group>\"\n",
    "workspace_name = \"<workspace-name>\"\n",
    "workspace_region = \"<workspace-region>\"\n",
    "\n",
    "# Connect to a workspace\n",
    "ws = get_or_create_workspace(\n",
    "    config_path=\"./.azureml\",\n",
    "    subscription_id=subscription_id,\n",
    "    resource_group=resource_group,\n",
    "    workspace_name=workspace_name,\n",
    "    workspace_region=workspace_region,\n",
    ")\n",
    "print(\n",
    "    \"Workspace name: \" + ws.name,\n",
    "    \"Azure region: \" + ws.location,\n",
    "    \"Resource group: \" + ws.resource_group,\n",
    "    sep=\"\\n\",\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create compute resources for your experiments\n",
    "\n",
    "We run AutoML on a dynamically scalable compute cluster. In the next cell, we create an AmlCompute target with a specific cluster name, VM size, and maximum number of nodes if the cluster does not exist. Otherwise, we will reuse an existing one. For more options of VM sizes, please check the information in this [link](https://docs.microsoft.com/en-us/azure/virtual-machines/sizes-general)."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 7,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Found existing cpu-cluster\n"
     ]
    }
   ],
   "source": [
    "# Choose a name for your cluster\n",
    "cluster_name = \"cpu-cluster\"\n",
    "# VM Size\n",
    "vm_size = \"STANDARD_D2_V2\"\n",
    "# Maximum number of nodes of the cluster\n",
    "max_nodes = 4\n",
    "\n",
    "# Create a new AmlCompute if it does not exist or reuse an existing one\n",
    "cpu_cluster = get_or_create_amlcompute(\n",
    "    workspace=ws,\n",
    "    compute_name=cluster_name,\n",
    "    vm_size=vm_size,\n",
    "    min_nodes=0,\n",
    "    max_nodes=max_nodes,\n",
    "    verbose=True,\n",
    ")"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Define Experiment\n",
    "\n",
    "To run AutoML, you need to create an Experiment. An Experiment corresponds to a prediction problem you are trying to solve, while a Run corresponds to a specific approach to the problem."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 8,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th></th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>SDK version</th>\n",
       "      <td>1.0.85</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Workspace</th>\n",
       "      <td>chhamlws</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>SKU</th>\n",
       "      <td>Basic</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Resource Group</th>\n",
       "      <td>chhamlwsrg</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Location</th>\n",
       "      <td>westcentralus</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>Run History Name</th>\n",
       "      <td>automl-ojforecasting</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "                                      \n",
       "SDK version       1.0.85              \n",
       "Workspace         chhamlws            \n",
       "SKU               Basic               \n",
       "Resource Group    chhamlwsrg          \n",
       "Location          westcentralus       \n",
       "Run History Name  automl-ojforecasting"
      ]
     },
     "execution_count": 8,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# choose a name for the run history container in the workspace\n",
    "experiment_name = \"automl-ojforecasting\"\n",
    "\n",
    "experiment = Experiment(ws, experiment_name)\n",
    "\n",
    "output = {}\n",
    "output[\"SDK version\"] = azureml.core.VERSION\n",
    "output[\"Workspace\"] = ws.name\n",
    "output[\"SKU\"] = ws.sku\n",
    "output[\"Resource Group\"] = ws.resource_group\n",
    "output[\"Location\"] = ws.location\n",
    "output[\"Run History Name\"] = experiment_name\n",
    "pd.set_option(\"display.max_colwidth\", -1)\n",
    "outputDf = pd.DataFrame(data=output, index=[\"\"])\n",
    "outputDf.T"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Data Preparation\n",
    "\n",
    "We need to download the Orange Juice data and split it into training and test sets. By default, the following cell will download and spit the data. If you've already done so, you may skip this part by switching `DOWNLOAD_SPLIT_DATA` to `False`.\n",
    "\n",
    "We store the training data and test data using dataframes. The training data includes `train_df` and `aux_df` with `train_df` containing the historical sales up to week 135 (the time we make forecasts) and `aux_df` containing price/promotion information up until week 138. We assume that future price and promotion information up to a certain number of weeks ahead is predetermined and known. The test data is stored in `test_df` which contains the sales of each product in week 137 and 138. Assuming the current week is week 135, our goal is to forecast the sales in week 137 and 138 using the training data. There is a one-week gap between the current week and the first target week of forecasting as we want to leave time for planning inventory in practice."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Data download and split"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 9,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Data already exists at the specified location.\n"
     ]
    }
   ],
   "source": [
    "if DOWNLOAD_SPLIT_DATA:\n",
    "    download_ojdata(DATA_DIR)\n",
    "    df = pd.read_csv(os.path.join(DATA_DIR, \"yx.csv\"))\n",
    "    df = df.loc[df.week <= LAST_WEEK]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 10,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Convert logarithm of the unit sales to unit sales\n",
    "df[\"move\"] = df[\"logmove\"].apply(lambda x: round(math.exp(x)))\n",
    "# Add timestamp column\n",
    "df[\"week_start\"] = df[\"week\"].apply(lambda x: FIRST_WEEK_START + datetime.timedelta(days=(x - 1) * 7))\n",
    "# Select a subset of stores for demo purpose\n",
    "df_sub = df[df.store.isin(USE_STORES)]"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 11,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Split data into training and test sets\n",
    "def split_last_n_by_grain(df, n):\n",
    "    \"\"\"Group df by grain and split on last n rows for each group.\"\"\"\n",
    "    df_grouped = df.sort_values(time_column_name).groupby(  # Sort by ascending time\n",
    "        grain_column_names, group_keys=False\n",
    "    )\n",
    "    df_head = df_grouped.apply(lambda dfg: dfg.iloc[:-n])\n",
    "    df_tail = df_grouped.apply(lambda dfg: dfg.iloc[-n:])\n",
    "    return df_head, df_tail\n",
    "\n",
    "\n",
    "train_df, test_df = split_last_n_by_grain(df_sub, NUM_TEST_PERIODS)\n",
    "train_df.reset_index(drop=True)\n",
    "test_df.reset_index(drop=True)\n",
    "\n",
    "# Save data locally\n",
    "local_data_pathes = [\n",
    "    os.path.join(DATA_DIR, \"train.csv\"),\n",
    "    os.path.join(DATA_DIR, \"test.csv\"),\n",
    "]\n",
    "\n",
    "train_df.to_csv(local_data_pathes[0], index=None, header=True)\n",
    "test_df.to_csv(local_data_pathes[1], index=None, header=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Upload data to datastore\n",
    "\n",
    "The [Machine Learning service workspace](https://docs.microsoft.com/en-us/azure/machine-learning/service/concept-workspace), is paired with the storage account, which contains the default data store. We will use it to upload the train and test data and create [tabular datasets](https://docs.microsoft.com/en-us/python/api/azureml-core/azureml.data.tabulardataset?view=azure-ml-py) for training and testing. A tabular dataset defines a series of lazily-evaluated, immutable operations to load data from the data source into tabular representation.\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 12,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "Uploading an estimated of 2 files\n",
      "Uploading /data/home/chenhui/work/forecasting/ojdata/test.csv\n",
      "Uploading /data/home/chenhui/work/forecasting/ojdata/train.csv\n",
      "Uploaded /data/home/chenhui/work/forecasting/ojdata/test.csv, 1 files out of an estimated total of 2\n",
      "Uploaded /data/home/chenhui/work/forecasting/ojdata/train.csv, 2 files out of an estimated total of 2\n",
      "Uploaded 2 files\n"
     ]
    },
    {
     "data": {
      "text/plain": [
       "$AZUREML_DATAREFERENCE_1f003008a69b4030b4c6165a27ca7f24"
      ]
     },
     "execution_count": 12,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "datastore = ws.get_default_datastore()\n",
    "datastore.upload_files(files=local_data_pathes, target_path=\"dataset/\", overwrite=True, show_progress=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Create dataset for training"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 13,
   "metadata": {},
   "outputs": [],
   "source": [
    "train_dataset = Dataset.Tabular.from_delimited_files(path=datastore.path(\"dataset/train.csv\"))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 14,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>store</th>\n",
       "      <th>brand</th>\n",
       "      <th>week</th>\n",
       "      <th>logmove</th>\n",
       "      <th>constant</th>\n",
       "      <th>price1</th>\n",
       "      <th>price2</th>\n",
       "      <th>price3</th>\n",
       "      <th>price4</th>\n",
       "      <th>price5</th>\n",
       "      <th>...</th>\n",
       "      <th>price7</th>\n",
       "      <th>price8</th>\n",
       "      <th>price9</th>\n",
       "      <th>price10</th>\n",
       "      <th>price11</th>\n",
       "      <th>deal</th>\n",
       "      <th>feat</th>\n",
       "      <th>profit</th>\n",
       "      <th>move</th>\n",
       "      <th>week_start</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>2976</th>\n",
       "      <td>8</td>\n",
       "      <td>11</td>\n",
       "      <td>131</td>\n",
       "      <td>10.40</td>\n",
       "      <td>1</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.03</td>\n",
       "      <td>...</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>5.52</td>\n",
       "      <td>33024</td>\n",
       "      <td>1992-03-12</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2977</th>\n",
       "      <td>8</td>\n",
       "      <td>11</td>\n",
       "      <td>132</td>\n",
       "      <td>10.39</td>\n",
       "      <td>1</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>...</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.02</td>\n",
       "      <td>1</td>\n",
       "      <td>1.00</td>\n",
       "      <td>5.48</td>\n",
       "      <td>32384</td>\n",
       "      <td>1992-03-19</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2978</th>\n",
       "      <td>8</td>\n",
       "      <td>11</td>\n",
       "      <td>133</td>\n",
       "      <td>9.37</td>\n",
       "      <td>1</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.04</td>\n",
       "      <td>...</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>5.38</td>\n",
       "      <td>11776</td>\n",
       "      <td>1992-03-26</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2979</th>\n",
       "      <td>8</td>\n",
       "      <td>11</td>\n",
       "      <td>134</td>\n",
       "      <td>9.34</td>\n",
       "      <td>1</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.03</td>\n",
       "      <td>...</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>7.16</td>\n",
       "      <td>11392</td>\n",
       "      <td>1992-04-02</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2980</th>\n",
       "      <td>8</td>\n",
       "      <td>11</td>\n",
       "      <td>135</td>\n",
       "      <td>10.51</td>\n",
       "      <td>1</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>...</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.02</td>\n",
       "      <td>1</td>\n",
       "      <td>1.00</td>\n",
       "      <td>8.29</td>\n",
       "      <td>36864</td>\n",
       "      <td>1992-04-09</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 21 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "      store  brand  week  logmove  constant  price1  price2  price3  price4  \\\n",
       "2976  8      11     131  10.40     1        0.03    0.04    0.04    0.03      \n",
       "2977  8      11     132  10.39     1        0.03    0.04    0.04    0.04      \n",
       "2978  8      11     133  9.37      1        0.05    0.04    0.04    0.03      \n",
       "2979  8      11     134  9.34      1        0.04    0.04    0.04    0.03      \n",
       "2980  8      11     135  10.51     1        0.04    0.04    0.04    0.04      \n",
       "\n",
       "      price5     ...      price7  price8  price9  price10  price11  deal  \\\n",
       "2976 0.03        ...     0.04    0.03    0.02    0.02     0.02      0      \n",
       "2977 0.03        ...     0.03    0.03    0.02    0.02     0.02      1      \n",
       "2978 0.04        ...     0.03    0.03    0.02    0.02     0.02      0      \n",
       "2979 0.03        ...     0.04    0.03    0.02    0.02     0.02      0      \n",
       "2980 0.03        ...     0.04    0.03    0.03    0.02     0.02      1      \n",
       "\n",
       "      feat  profit   move  week_start  \n",
       "2976 0.00  5.52     33024 1992-03-12   \n",
       "2977 1.00  5.48     32384 1992-03-19   \n",
       "2978 0.00  5.38     11776 1992-03-26   \n",
       "2979 0.00  7.16     11392 1992-04-02   \n",
       "2980 1.00  8.29     36864 1992-04-09   \n",
       "\n",
       "[5 rows x 21 columns]"
      ]
     },
     "execution_count": 14,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "train_dataset.to_pandas_dataframe().tail()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Modeling\n",
    "\n",
    "For forecasting tasks, AutoML uses pre-processing and estimation steps that are specific to time-series. AutoML will undertake the following pre-processing steps:\n",
    "* Detect time-series sample frequency (e.g. hourly, daily, weekly) and create new records for absent time points to make the series regular. A regular time series has a well-defined frequency and has a value at every sample point in a contiguous time span\n",
    "* Impute missing values in the target (via forward-fill) and feature columns (using median column values)\n",
    "* Create grain-based features to enable fixed effects across different series\n",
    "* Create time-based features to assist in learning seasonal patterns\n",
    "* Encode categorical variables to numeric quantities\n",
    "\n",
    "In this notebook, AutoML will train a single, regression-type model across all time-series in a given training set. This allows the model to generalize across related series. To create a training job, we use AutoML Config object to define the settings and data. Here is a summary of the meanings of the AutoMLConfig parameters:\n",
    "\n",
    "|Property|Description|\n",
    "|-|-|\n",
    "|**task**|forecasting|\n",
    "|**primary_metric**|This is the metric that you want to optimize.<br> Forecasting supports the following primary metrics <br><i>spearman_correlation</i><br><i>normalized_root_mean_squared_error</i><br><i>r2_score</i><br><i>normalized_mean_absolute_error</i>\n",
    "|**experiment_timeout_hours**|Experimentation timeout in hours.|\n",
    "|**enable_early_stopping**|If early stopping is on, training will stop when the primary metric is no longer improving.|\n",
    "|**training_data**|Input dataset, containing both features and label column.|\n",
    "|**label_column_name**|The name of the label column.|\n",
    "|**compute_target**|The remote compute for training.|\n",
    "|**n_cross_validations**|Number of cross-validation folds to use for model/pipeline selection|\n",
    "|**enable_voting_ensemble**|Allow AutoML to create a Voting ensemble of the best performing models|\n",
    "|**enable_stack_ensemble**|Allow AutoML to create a Stack ensemble of the best performing models|\n",
    "|**debug_log**|Log file path for writing debugging information|\n",
    "|**time_column_name**|Name of the datetime column in the input data|\n",
    "|**grain_column_names**|Name(s) of the columns defining individual series in the input data|\n",
    "|**drop_column_names**|Name(s) of columns to drop prior to modeling|\n",
    "|**max_horizon**|Maximum desired forecast horizon in units of time-series frequency|"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Model training"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 15,
   "metadata": {},
   "outputs": [],
   "source": [
    "time_series_settings = {\n",
    "    \"time_column_name\": time_column_name,\n",
    "    \"grain_column_names\": grain_column_names,\n",
    "    \"drop_column_names\": [\"logmove\"],  # 'logmove' is a leaky feature, so we remove it.\n",
    "    \"max_horizon\": NUM_TEST_PERIODS,\n",
    "}\n",
    "\n",
    "automl_config = AutoMLConfig(\n",
    "    task=\"forecasting\",\n",
    "    debug_log=\"automl_oj_sales_errors.log\",\n",
    "    primary_metric=\"normalized_mean_absolute_error\",\n",
    "    experiment_timeout_hours=1.0,  # You may increase this number to improve model accuracy\n",
    "    training_data=train_dataset,\n",
    "    label_column_name=target_column_name,\n",
    "    compute_target=cpu_cluster,\n",
    "    enable_early_stopping=True,\n",
    "    n_cross_validations=3,\n",
    "    verbosity=logging.INFO,\n",
    "    **time_series_settings\n",
    ")"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 16,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<table style=\"width:100%\"><tr><th>Experiment</th><th>Id</th><th>Type</th><th>Status</th><th>Details Page</th><th>Docs Page</th></tr><tr><td>automl-ojforecasting</td><td>AutoML_45710381-d3fc-47c9-816d-9874c41b5355</td><td>automl</td><td>Starting</td><td><a href=\"https://ml.azure.com/experiments/automl-ojforecasting/runs/AutoML_45710381-d3fc-47c9-816d-9874c41b5355?wsid=/subscriptions/9086b59a-02d7-4687-b3fd-e39fa5e0fd9b/resourcegroups/chhamlwsrg/workspaces/chhamlws\" target=\"_blank\" rel=\"noopener\">Link to Azure Machine Learning studio</a></td><td><a href=\"https://docs.microsoft.com/en-us/python/api/overview/azure/ml/intro?view=azure-ml-py\" target=\"_blank\" rel=\"noopener\">Link to Documentation</a></td></tr></table>"
      ],
      "text/plain": [
       "Run(Experiment: automl-ojforecasting,\n",
       "Id: AutoML_45710381-d3fc-47c9-816d-9874c41b5355,\n",
       "Type: automl,\n",
       "Status: Starting)"
      ]
     },
     "execution_count": 16,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "remote_run = experiment.submit(automl_config, show_output=False)\n",
    "remote_run"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "metadata": {},
   "outputs": [],
   "source": [
    "remote_run.wait_for_completion()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Retrieve the best model\n",
    "\n",
    "Each run within an Experiment stores serialized (i.e. pickled) pipelines from the AutoML iterations. After the training job is done, we can retrieve the pipeline with the best performance on the validation dataset."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 18,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[('timeseriestransformer', TimeSeriesTransformer(logger=None,\n",
      "           pipeline_type=<TimeSeriesPipelineType.FULL: 1>)), ('MinMaxScaler', MinMaxScaler(copy=True, feature_range=(0, 1))), ('GradientBoostingRegressor', GradientBoostingRegressor(alpha=0.9, criterion='mse', init=None,\n",
      "             learning_rate=0.1, loss='huber', max_depth=10,\n",
      "             max_features='sqrt', max_leaf_nodes=None,\n",
      "             min_impurity_decrease=0.0, min_impurity_split=None,\n",
      "             min_samples_leaf=0.15874989977926784,\n",
      "             min_samples_split=0.10734188827013527,\n",
      "             min_weight_fraction_leaf=0.0, n_estimators=50,\n",
      "             n_iter_no_change=None, presort='auto', random_state=None,\n",
      "             subsample=0.95, tol=0.0001, validation_fraction=0.1,\n",
      "             verbose=0, warm_start=False))]\n"
     ]
    }
   ],
   "source": [
    "best_run, fitted_model = remote_run.get_output()\n",
    "print(fitted_model.steps)\n",
    "model_name = best_run.properties[\"model_name\"]"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Forecasting\n",
    "\n",
    "Now that we have retrieved the best model pipeline, we can apply it to generate forecasts for the target weeks. To do this, we first remove the target values from the test set"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Generate forecasts"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 19,
   "metadata": {},
   "outputs": [],
   "source": [
    "X_test = test_df\n",
    "y_test = X_test.pop(target_column_name).values"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 20,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>store</th>\n",
       "      <th>brand</th>\n",
       "      <th>week</th>\n",
       "      <th>logmove</th>\n",
       "      <th>constant</th>\n",
       "      <th>price1</th>\n",
       "      <th>price2</th>\n",
       "      <th>price3</th>\n",
       "      <th>price4</th>\n",
       "      <th>price5</th>\n",
       "      <th>price6</th>\n",
       "      <th>price7</th>\n",
       "      <th>price8</th>\n",
       "      <th>price9</th>\n",
       "      <th>price10</th>\n",
       "      <th>price11</th>\n",
       "      <th>deal</th>\n",
       "      <th>feat</th>\n",
       "      <th>profit</th>\n",
       "      <th>week_start</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>85</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>136</td>\n",
       "      <td>8.59</td>\n",
       "      <td>1</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>33.54</td>\n",
       "      <td>1992-04-16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>86</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>137</td>\n",
       "      <td>9.19</td>\n",
       "      <td>1</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>20.43</td>\n",
       "      <td>1992-04-23</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>87</th>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>138</td>\n",
       "      <td>9.74</td>\n",
       "      <td>1</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.03</td>\n",
       "      <td>1</td>\n",
       "      <td>1.00</td>\n",
       "      <td>11.29</td>\n",
       "      <td>1992-04-30</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>195</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>136</td>\n",
       "      <td>9.14</td>\n",
       "      <td>1</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.03</td>\n",
       "      <td>1</td>\n",
       "      <td>0.00</td>\n",
       "      <td>27.13</td>\n",
       "      <td>1992-04-16</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>196</th>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>137</td>\n",
       "      <td>8.74</td>\n",
       "      <td>1</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>33.30</td>\n",
       "      <td>1992-04-23</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "</div>"
      ],
      "text/plain": [
       "     store  brand  week  logmove  constant  price1  price2  price3  price4  \\\n",
       "85   2      1      136  8.59      1        0.05    0.05    0.05    0.05      \n",
       "86   2      1      137  9.19      1        0.04    0.05    0.05    0.04      \n",
       "87   2      1      138  9.74      1        0.04    0.04    0.05    0.04      \n",
       "195  2      2      136  9.14      1        0.05    0.05    0.05    0.05      \n",
       "196  2      2      137  8.74      1        0.04    0.05    0.05    0.04      \n",
       "\n",
       "     price5  price6  price7  price8  price9  price10  price11  deal  feat  \\\n",
       "85  0.04    0.05    0.03    0.04    0.03    0.02     0.03      0    0.00    \n",
       "86  0.03    0.04    0.03    0.04    0.04    0.02     0.03      0    0.00    \n",
       "87  0.04    0.05    0.04    0.04    0.04    0.03     0.03      1    1.00    \n",
       "195 0.04    0.05    0.03    0.04    0.03    0.02     0.03      1    0.00    \n",
       "196 0.03    0.04    0.03    0.04    0.04    0.02     0.03      0    0.00    \n",
       "\n",
       "     profit week_start  \n",
       "85  33.54   1992-04-16  \n",
       "86  20.43   1992-04-23  \n",
       "87  11.29   1992-04-30  \n",
       "195 27.13   1992-04-16  \n",
       "196 33.30   1992-04-23  "
      ]
     },
     "execution_count": 20,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "X_test.head()"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 21,
   "metadata": {},
   "outputs": [],
   "source": [
    "# The featurized data, aligned to y, will also be returned. It contains the assumptions\n",
    "# that were made in the forecast and helps align the forecast to the original data.\n",
    "y_predictions, X_trans = fitted_model.forecast(X_test)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We need to align the output explicitly to the input, as the count and order of the rows may have changed during transformations that span multiple rows."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 22,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>week_start</th>\n",
       "      <th>store</th>\n",
       "      <th>brand</th>\n",
       "      <th>predicted</th>\n",
       "      <th>week</th>\n",
       "      <th>logmove</th>\n",
       "      <th>constant</th>\n",
       "      <th>price1</th>\n",
       "      <th>price2</th>\n",
       "      <th>price3</th>\n",
       "      <th>...</th>\n",
       "      <th>price6</th>\n",
       "      <th>price7</th>\n",
       "      <th>price8</th>\n",
       "      <th>price9</th>\n",
       "      <th>price10</th>\n",
       "      <th>price11</th>\n",
       "      <th>deal</th>\n",
       "      <th>feat</th>\n",
       "      <th>profit</th>\n",
       "      <th>move</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1992-04-16</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>4075.57</td>\n",
       "      <td>136</td>\n",
       "      <td>8.59</td>\n",
       "      <td>1</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>...</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>33.54</td>\n",
       "      <td>5376</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1992-04-16</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>7212.49</td>\n",
       "      <td>136</td>\n",
       "      <td>9.14</td>\n",
       "      <td>1</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>...</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.03</td>\n",
       "      <td>1</td>\n",
       "      <td>0.00</td>\n",
       "      <td>27.13</td>\n",
       "      <td>9312</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1992-04-16</td>\n",
       "      <td>2</td>\n",
       "      <td>3</td>\n",
       "      <td>4075.57</td>\n",
       "      <td>136</td>\n",
       "      <td>7.85</td>\n",
       "      <td>1</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>...</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>32.55</td>\n",
       "      <td>2560</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1992-04-16</td>\n",
       "      <td>2</td>\n",
       "      <td>4</td>\n",
       "      <td>4011.43</td>\n",
       "      <td>136</td>\n",
       "      <td>7.42</td>\n",
       "      <td>1</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>...</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>34.98</td>\n",
       "      <td>1664</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1992-04-16</td>\n",
       "      <td>2</td>\n",
       "      <td>5</td>\n",
       "      <td>4336.83</td>\n",
       "      <td>136</td>\n",
       "      <td>8.59</td>\n",
       "      <td>1</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>...</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0.02</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>28.80</td>\n",
       "      <td>5376</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 22 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "  week_start  store  brand  predicted  week  logmove  constant  price1  \\\n",
       "0 1992-04-16  2      1     4075.57     136  8.59      1        0.05      \n",
       "1 1992-04-16  2      2     7212.49     136  9.14      1        0.05      \n",
       "2 1992-04-16  2      3     4075.57     136  7.85      1        0.05      \n",
       "3 1992-04-16  2      4     4011.43     136  7.42      1        0.05      \n",
       "4 1992-04-16  2      5     4336.83     136  8.59      1        0.05      \n",
       "\n",
       "   price2  price3  ...   price6  price7  price8  price9  price10  price11  \\\n",
       "0 0.05    0.05     ...  0.05    0.03    0.04    0.03    0.02     0.03       \n",
       "1 0.05    0.05     ...  0.05    0.03    0.04    0.03    0.02     0.03       \n",
       "2 0.05    0.05     ...  0.05    0.03    0.04    0.03    0.02     0.03       \n",
       "3 0.05    0.05     ...  0.05    0.03    0.04    0.03    0.02     0.03       \n",
       "4 0.05    0.05     ...  0.05    0.03    0.04    0.03    0.02     0.03       \n",
       "\n",
       "   deal  feat  profit  move  \n",
       "0  0    0.00  33.54    5376  \n",
       "1  1    0.00  27.13    9312  \n",
       "2  0    0.00  32.55    2560  \n",
       "3  0    0.00  34.98    1664  \n",
       "4  0    0.00  28.80    5376  \n",
       "\n",
       "[5 rows x 22 columns]"
      ]
     },
     "execution_count": 22,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "pred_automl = align_outputs(y_predictions, X_trans, X_test, y_test, target_column_name)\n",
    "pred_automl.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Results evaluation & visualization"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 23,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "[Test data scores]\n",
      "\n",
      "explained_variance:   0.187\n",
      "r2_score:   0.172\n",
      "spearman_correlation:   0.703\n",
      "mean_absolute_percentage_error:   117.345\n",
      "mean_absolute_error:   6624.722\n",
      "normalized_mean_absolute_error:   0.046\n",
      "median_absolute_error:   3048.760\n",
      "normalized_median_absolute_error:   0.021\n",
      "root_mean_squared_error:   16663.119\n",
      "normalized_root_mean_squared_error:   0.115\n",
      "root_mean_squared_log_error:   0.890\n",
      "normalized_root_mean_squared_log_error:   0.140\n"
     ]
    },
    {
     "data": {
      "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYkAAAD4CAYAAAAZ1BptAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjIsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy8li6FKAAAgAElEQVR4nO3dfXQc1Znn8e8jWcaIF78nAcuWTOLNBvAEbK1DgEkEAtuQgD2zzDnOCDBvowDJLMzLBnO0B+NkdCYkcxYPE0yiBYOxewOEMIMheB1jYCYzJxhkCIiXEAuwhIDEwjaaDB7AL8/+UVdySeqSulstqSX9Puf06eqnblVdXUn9dN17u8rcHRERkXSKhrsCIiJSuJQkREQkkZKEiIgkUpIQEZFEShIiIpJo3HBXIN+mTZvmFRUVw10NEZERZfv27e+5+/Se8VGXJCoqKmhsbBzuaoiIjChm1pIuru4mERFJpCQhIiKJRl13U0/79++nra2NDz/8cLirUjAmTJhAWVkZJSUlw10VESlwoz5JtLW1ccwxx1BRUYGZDXd1hp27s3v3btra2pg9e/ZwV0dECtyo72768MMPmTp1qhJEYGZMnTpVZ1YikpFRnyQAJYge1B4ikqkxkSRGgqqqKgC++93v8vbbb6cts3bt2q7l66+/noMHDw5F1USkAKWaUlSsrqBoVREVqytINaUG5ThKEkAqBRUVUFQUPafy3NaHDh3KuOyKFSuYMWNG2nXxJLF69WqKi4sHXDcRGXlSTSlqH6mlpaMFx2npaKH2kdpBSRRjPkmkUlBbCy0t4B4919ZmnyieeuopLrjgAs477zzOPvtsnnvuOc466ywuuugi7rnnHrZt20ZVVRVnnHEGd999NwCPPvoo8+fP5+tf/zoHDhwA4LLLLqO5uZkPPviAiy66iC9/+ctcfvnlbNy4kaamJqqqqtiyZQtVVVUcOHCA1tZWzj77bM444wxuueUWAG6++WauuuoqzjnnHK666qq8tpeIDL+6rXXs27+vW2zf/n3Uba3L+7FG/eym/tTVwb7ubc2+fVG8pia7fX344Yds2bKF+++/n5///Ofs2rWLxx9/nOLiYhYtWsTGjRs55phjOPfcc6mpqeFv//Zv+ed//mf27t3LWWed1W1fDQ0NLFy4kNraWg4dOkRRURFz587lqaeeAqC+vh6AW265hVWrVvGHf/iHLFq0iEsuuQSAk046iTvvvJOFCxfy/vvvM2nSpJzaR0QKT2tHa1bxgRjzZxKtCW2aFO/LqaeeCsApp5zC448/zuc///muLqEXXniBCy+8kLPOOovf/va3tLe3U1RUxNFHH83MmTOZPr37JVN+85vfcPrppwNQVJT8a3r99deZN29e1/HffPNNAE4++WQAjj/+eDo6OrL/YUSkYM2aOCur+ECM+SQxK6FNk+J9eeGFF7qeq6uru725n3rqqfzsZz/jqaee4vnnn2fGjBkcOnSIDz74gLa2Ntrb27vt67Of/SxPP/00cHhMI92spBNOOIHt27cD8Pzzz9N5ccN4Wd2iVmR0qa+up7SktFustKSU+ur6vB+r3yRhZmvNbJeZvZRm3V+bmZvZtPDazOw2M2s2sxfNbF6s7HIz2xEey2Px+WbWFLa5zcK7m5lNMbMtofwWM5ucnx+5u/p6KO3e1pSWRvFslZSUsHjxYtasWcPChQu7rVu1alXXmcSyZcsAuOGGG/jSl77EqlWr+NSnPtWt/J/92Z+xadMmvvzlL3eNKyxYsIClS5fyi1/8oqvcDTfcwE033cTpp59OVVVV4qC3iIweNXNraLiggfKJ5RhG+cRyGi5ooGZuln3kmXD3Ph/Al4B5wEs94jOBzUALMC3Ezgc2AQacBmwL8SnAG+F5clieHNY9A3wxbLMJOC/EvwesCMsrgFv6q6u7M3/+fI975ZVXvD8bNriXl7ubRc8bNvS7SS9PPvmk19XVZb/hMMmkXURk7AAaPc17ar9nEu7+L8CeNKtuBb4FxPsylgD3hmM+DUwys+OARcAWd9/j7nuBLcDisO5Yd/9lqOS9wNLYvtaF5XWxeN7V1MDOnXDoUPSc7YC1iMholdPsJjO7EHjb3V/o0U8+A3gr9rotxPqKt6WJA3zS3d8FcPd3zewTfdSnFqgFmJXLYEIeVFVVdX0hTkRktMh64NrMSoE64KZ0q9PEPId4Vty9wd0r3b2y5ywhERHJXS6zmz4NzAZeMLOdQBnwnJl9iuhMYGasbBnwTj/xsjRxgN+F7ijC864c6ioiIgOQdZJw9yZ3/4S7V7h7BdEb/Tx3/y2wEbg0zHI6DegIXUabgYVmNjnMUloIbA7rfm9mp4VZTZcCD4dDbQQ6Z0Etj8VFRGSIZDIF9sfAL4HPmlmbmV3ZR/HHiGYuNQP/B7gWwN33AN8Bng2Pb4cYwDXAnWGb14lmOAF8FzjXzHYA54bXBWvnzp088cQTGZW95557ur770HkZDhGRQpTJ7Kavuftx7l7i7mXufleP9RXu/l5Ydnf/hrt/2t3nuntjrNxad/9MeNwdize6+8lhm2+GWU64+253r3b3OeE53QyrvMjH1RTTJYmkC/vFk4SISCEb89+4ztfVFBsaGli/fj3V1dXdLux35plndpWpqqrimWee4Ve/+hXV1dWsX78egH/4h3/gzDPPZNWqVXn92UREBmrMJ4l8XU2xtraWSy65hLvuuotdu3Zx//33c8UVV/Qqt2DBAk455RS2bt3adTG+qqoq/vVf/5XHHnss9x9ERGQQjPkkMRhXU4xf2K+T93H9pM6L8R155JE5H1NEZDCM+SSRr6splpSUdN0pLn5hP3fno48+oqmpKW1Z0O1ERaRwjfkkka+rKZ588sn827/9GzfccEO3+GWXXcaZZ57JT37yk67YV77yFZYuXcpPf/rT3CsuIjIErK9ukJGosrLSGxu7JlXx6quv8rnPfa7PbVJNKeq21tHa0cqsibOor64fnKspFpBM2kVExg4z2+7ulT3jY/7OdBBddne0JwURkVyM+e4mERFJNiaSxGjrUhsotYeIZGrUJ4kJEyawe/duvTEG7s7u3buZMGHCcFdFREaAUT8mUVZWlvYe0mPZhAkTKCsr67+giIx5oz5JlJSUMHv27OGuhojIiDTqu5tERCR3ShIiIpJISUJERBIpSYiISCIlCRERSaQkISIiiZQkREQkUb9JwszWmtkuM3spFvu+mf3azF40s380s0mxdTeaWbOZvWZmi2LxxSHWbGYrYvHZZrbNzHaY2f1mNj7Ejwivm8P6inz90CIikplMziTuARb3iG0BTnb3PwB+A9wIYGYnAsuAk8I2a8ys2MyKgduB84ATga+FsgC3ALe6+xxgL3BliF8J7HX3zwC3hnIiIjKE+k0S7v4vwJ4esZ+7+4Hw8mmg8xoPS4D73P0jd38TaAYWhEezu7/h7h8D9wFLLLol29nAg2H7dcDS2L7WheUHgWrTLdxERIZUPsYkrgA2heUZwFuxdW0hlhSfCrwfSzid8W77Cus7QvlezKzWzBrNrFHXaBIRyZ8BJQkzqwMOAKnOUJpinkO8r331Dro3uHulu1dOnz6970qLiEjGcr7An5ktB74KVPvh63C3ATNjxcqAd8Jyuvh7wCQzGxfOFuLlO/fVZmbjgIn06PYSEZHBldOZhJktBm4ALnT3fbFVG4FlYWbSbGAO8AzwLDAnzGQaTzS4vTEklyeBi8L2y4GHY/taHpYvAp5w3RRCRGRI9XsmYWY/BqqAaWbWBqwkms10BLAljCU/7e5Xu/vLZvYA8ApRN9Q33P1g2M83gc1AMbDW3V8Oh7gBuM/M/gZ4HrgrxO8C1ptZM9EZxLI8/LwiIpIFG20fzisrK72xsXG4qyEiMqKY2XZ3r+wZ1zeuRUQkkZKEiIgkUpIQEZFEShIiIpJISUJERBIpSYiISCIlCRERSaQkISIiiZQkREQkkZKEiIgkUpIQEZFEShIiIpJISUJERBIpSYiISCIlCRERSaQkISIiiZQkREQkkZKEiIgk6jdJmNlaM9tlZi/FYlPMbIuZ7QjPk0PczOw2M2s2sxfNbF5sm+Wh/A4zWx6LzzezprDNbRZump10DBERGTqZnEncAyzuEVsBbHX3OcDW8BrgPGBOeNQCd0D0hg+sBL4ALABWxt707whlO7db3M8xRERkiPSbJNz9X4A9PcJLgHVheR2wNBa/1yNPA5PM7DhgEbDF3fe4+15gC7A4rDvW3X/p7g7c22Nf6Y4hIiJDJNcxiU+6+7sA4fkTIT4DeCtWri3E+oq3pYn3dYxezKzWzBrNrLG9vT3HH0lERHrK98C1pYl5DvGsuHuDu1e6e+X06dOz3VxERBLkmiR+F7qKCM+7QrwNmBkrVwa800+8LE28r2OIiMgQyTVJbAQ6ZygtBx6OxS8Ns5xOAzpCV9FmYKGZTQ4D1guBzWHd783stDCr6dIe+0p3DBERGSLj+itgZj8GqoBpZtZGNEvpu8ADZnYl0Ar8SSj+GHA+0AzsAy4HcPc9ZvYd4NlQ7tvu3jkYfg3RDKojgU3hQR/HEBGRIWLRpKLRo7Ky0hsbG4e7GiIiI4qZbXf3yp5xfeNaREQSKUmIiEgiJQkREUmkJCEiIomUJEREJJGShIiIJFKSEBGRREoSIiKSSElCREQSKUmIiEgiJQkREUmkJCEiIomUJEREJJGShIiIJFKSEBGRREoSIiKSSElCREQSKUmIiEiiASUJM/sLM3vZzF4ysx+b2QQzm21m28xsh5ndb2bjQ9kjwuvmsL4itp8bQ/w1M1sUiy8OsWYzWzGQuoqISPZyThJmNgP4H0Clu58MFAPLgFuAW919DrAXuDJsciWw190/A9waymFmJ4btTgIWA2vMrNjMioHbgfOAE4GvhbIiIjJEBtrdNA440szGAaXAu8DZwINh/TpgaVheEl4T1lebmYX4fe7+kbu/CTQDC8Kj2d3fcPePgftCWRERGSI5Jwl3fxv4O6CVKDl0ANuB9939QCjWBswIyzOAt8K2B0L5qfF4j22S4r2YWa2ZNZpZY3t7e64/koiI9DCQ7qbJRJ/sZwPHA0cRdQ315J2bJKzLNt476N7g7pXuXjl9+vT+qi4iIhkaSHfTOcCb7t7u7vuBh4DTgUmh+wmgDHgnLLcBMwHC+onAnni8xzZJcRERGSIDSRKtwGlmVhrGFqqBV4AngYtCmeXAw2F5Y3hNWP+Eu3uILwuzn2YDc4BngGeBOWG21Hiiwe2NA6iviIhkaVz/RdJz921m9iDwHHAAeB5oAH4G3GdmfxNid4VN7gLWm1kz0RnEsrCfl83sAaIEcwD4hrsfBDCzbwKbiWZOrXX3l3Otr4iIZM+iD/OjR2VlpTc2Ng53NURERhQz2+7ulT3j+sa1iIgkUpIQEZFEShIiIpJISUJERBIpSYgMglRTiorVFRStKqJidQWpptRwV0kkJzlPgRWR9FJNKWofqWXf/n0AtHS0UPtILQA1c2uGs2oiWdOZhEie1W2t60oQnfbt30fd1rphqpFI7pQkRPKstaM1q7hIIVOSEMlR0rjDrImz0pZPiosUMiUJkRx0jju0dLTgeNe4Q6opRX11PaUlpd3Kl5aUUl9dP0y1FcmdkoRIDvoad6iZW0PDBQ2UTyzHMMonltNwQYMGrWVE0uwmkRz0N+5QM7dGSUFGBZ1JiORA4w4yVihJiORA4w4yVihJiORA4w4yVuh+EiIiovtJiIhI9pQkREQk0YCShJlNMrMHzezXZvaqmX3RzKaY2RYz2xGeJ4eyZma3mVmzmb1oZvNi+1keyu8ws+Wx+Hwzawrb3GZmNpD6iohIdgZ6JvH3wP9z9/8KfB54FVgBbHX3OcDW8BrgPGBOeNQCdwCY2RRgJfAFYAGwsjOxhDK1se0WD7C+IiKShZyThJkdC3wJuAvA3T929/eBJcC6UGwdsDQsLwHu9cjTwCQzOw5YBGxx9z3uvhfYAiwO64519196NLp+b2xfIiIyBAZyJnEC0A7cbWbPm9mdZnYU8El3fxcgPH8ilJ8BvBXbvi3E+oq3pYn3Yma1ZtZoZo3t7e0D+JFERCRuIEliHDAPuMPdTwU+4HDXUjrpxhM8h3jvoHuDu1e6e+X06dP7rrWIiGRsIEmiDWhz923h9YNESeN3oauI8LwrVn5mbPsy4J1+4mVp4iIiMkRyThLu/lvgLTP7bAhVA68AG4HOGUrLgYfD8kbg0jDL6TSgI3RHbQYWmtnkMGC9ENgc1v3ezE4Ls5ouje1LRESGwECvAvvnQMrMxgNvAJcTJZ4HzOxKoBX4k1D2MeB8oBnYF8ri7nvM7DvAs6Hct919T1i+BrgHOBLYFB4iIjJEdFkOERHRZTlERCR7ShIiIpJISUJERBIpSYiISCIlCRERSaQkISIiiZQkREQkkZKEiIgkUpKQgpVqSlGxuoKiVUVUrK4g1ZQa7iqJjDkDvSyHyKBINaWofaSWffv3AdDS0ULtI7UA1MytGc6qiYwpOpOQglS3ta4rQXTat38fdVvrhqlGImOTkoQUpNaO1qziIjI4lCSkIM2aOCuruIgMDiUJGRb9DUrXV9dTWlLaLVZaUkp9df1QVlNkzFOSkCHXOSjd0tGC412D0vFEUTO3hoYLGiifWI5hlE8sp+GCBg1aiwwx3U9ChlzF6gpaOlp6xcsnlrPz+p1DXyER0f0kpHBoUFpk5FCSkCGnQWmRkWPAScLMis3seTN7NLyebWbbzGyHmd0f7n+NmR0RXjeH9RWxfdwY4q+Z2aJYfHGINZvZioHWVQqDBqVFRo58nElcB7wae30LcKu7zwH2AleG+JXAXnf/DHBrKIeZnQgsA04CFgNrQuIpBm4HzgNOBL4WysoIp0FpkZFjQJflMLMy4CtAPfCXZmbA2cCfhiLrgJuBO4AlYRngQeAHofwS4D53/wh408yagQWhXLO7vxGOdV8o+8pA6iyFoWZujZKCyAgw0DOJ1cC3gEPh9VTgfXc/EF63ATPC8gzgLYCwviOU74r32CYpLiIiQyTnJGFmXwV2ufv2eDhNUe9nXbbxdHWpNbNGM2tsb2/vo9YiIpKNgZxJnAFcaGY7gfuIuplWA5PMrLMbqwx4Jyy3ATMBwvqJwJ54vMc2SfFe3L3B3SvdvXL69OkD+JFERCQu5yTh7je6e5m7VxANPD/h7jXAk8BFodhy4OGwvDG8Jqx/wqNv8m0EloXZT7OBOcAzwLPAnDBbanw4xsZc6ysiItkbjPtJ3ADcZ2Z/AzwP3BXidwHrw8D0HqI3fdz9ZTN7gGhA+gDwDXc/CGBm3wQ2A8XAWnd/eRDqKyIiCXRZDslIqilF3dY6WjtamTVxFvXV9ZqdJDKKJF2WQ3emk37pLnEiY5cuyyH90l3iRMYuJQnply7IJzJ2KUlILz1vCDTlyClpy+mCfCKjn8YkpEuqKcV1m65j93/u7oq1dLQwvng8JUUl7D+0vyuuC/KJjA06kxDg8OB0PEF0+vjgxxx7xLG6IJ/IGKQzCQHSD07H7fnPPbz3rfeGsEYiUgh0JiFA/4PQGn8QGZuUJAToOwlo/EFk7FKSECD93eIAph45VeMPImOYxiQEOPzNaV16Q0TidO0mERFJvHaTuptERCSRkoSIiCRSkhARkURKEiIikkhJQkREEilJjGA9r9aaakoNd5VEZJTR9yRGqGt/di0/bPwhTjSFWXeLE5HBkPOZhJnNNLMnzexVM3vZzK4L8SlmtsXMdoTnySFuZnabmTWb2YtmNi+2r+Wh/A4zWx6LzzezprDNbWZmA/lhR4tUU6pbguiku8WJSL4NpLvpAPBX7v454DTgG2Z2IrAC2Oruc4Ct4TXAecCc8KgF7oAoqQArgS8AC4CVnYkllKmNbbd4APUd0VJNKaZ9bxq2yrj4oYt7JYhOuluciORTzknC3d919+fC8u+BV4EZwBJgXSi2DlgalpcA93rkaWCSmR0HLAK2uPsed98LbAEWh3XHuvsvPfpa+L2xfY0p1/7sWi5+6OK093roSVdrFZF8ysvAtZlVAKcC24BPuvu7ECUS4BOh2AzgrdhmbSHWV7wtTTzd8WvNrNHMGtvb2wf64xSUVFOKOxrvyKisYbpaq4jk1YCThJkdDfwUuN7d/72vomlinkO8d9C9wd0r3b1y+vTp/VV5ROjsXrr4oYszKm8YV1derUFrEcmrASUJMyshShApd38ohH8XuooIz7tCvA2YGdu8DHinn3hZmviIkUpBRQUUFUXPqQxnqKaaUlz20BUZdS/hwPvlTHlqPWe8v2YAtT1cXzMYNy56zqbeIjL6DGR2kwF3Aa+6+/+OrdoIdM5QWg48HItfGmY5nQZ0hO6ozcBCM5scBqwXApvDut+b2WnhWJfG9lXwUimorYWWFnCPni+5JPmNN55QLr2njgN83P9BHHhoA6zeye6naqitzf0NPV5fgIMHo+eWlih+7bW5JbyByDXJikgeuXtOD+BMorepF4Ffhcf5wFSiWU07wvOUUN6A24HXgSagMravK4Dm8Lg8Fq8EXgrb/IBwafO+HvPnz/dCUF7uHqWH9I/SUvcNG6KyGza4l8zf4Fxf7qw0ZyXOzf08VuJcXN1rv+Xlg1Nfs+T6D4YNG6JjDOUxs7VhQ9RuZtFzIdVNRr98//0BjZ7mPVX3kxgkRUXRW1tfysth506YdlaK3afXwvh9/e/YAS+CZ78Om3p3L5nBoUODU9+eOus/GCoqDp/VDNUxs9F55rUv9isrLYWGBqipidbX1UFrK8yaBfX1UVwkH/r7+8tF0v0klCQGSdKbXJe5KVh8HRy1O3mYvqcDJfDw3dCU/FeQ65tov/VNI9eElImkpDWYx8xGX0msvj7//8AicYPxIUo3HRoEffWZ19dHbwxpzU3B0uVRgoDkBOGxxwdT+00QpaXRcXPRV32Tvuc+axC/kpG078E8ZjZaE76z2NoanUHs63FSuG9fFBfJh77+/vJNSSJH6Qam4wPHNTXRJ8fy8uh1tzfaxddB8cH+D9JRDqs8enz/vbQJorg42nd5+cA+qfasb3Fx9FxeDldf3TuBDCQhZSJd0hrsY2ajryQ2lP/AMjYN5YcoJYkcZfJpsaYmOvVzh6tvT1H81xWwsghK+5/aOt5Kuea/9P2OWFoK69ZF3S87dw68KyNe3wMHouedO2HNmsMJJB8JCfqfuRRPWvk6Zj71lcQK/SxIRr6h/BClMYkc9TfQO3Vq9Lx7wbXw334IRZm3c/nEcuqr66mZW9PnWME110Rv4CPNYAy6DYekwenR8vNJYcv35AgNXA9A5y8j24Fdrp8BE9/JbFAawGHDf9/Q7VvTqRRcnPCl66lT4b33sqxTASj0mUv5oNlNMtIoSWQplYLrroPdGXzpOa0cEgTPXMOGmjW93kz6ukD6SPz1FfrMJZGxSLObspBKwRVX5JggLj4HVlpmCeJgMbjB++XRN6c3rcl6BsxI/Day+uxFRg7dmS6Nujr4OIOrYvSSzdnDx6XwSEOvGUvpZsBMnZqcsOIzq2BkdGkkfY+gUGYuichhOpNII6upinNT8L9KMj97ADhkaRMEpP80/fd/DyUlfe9yJM3DL/SZSyJymJJEGhl3e1x8DvzxxTDuQJQcMkoQcNTP13PNmTW9xhqSPk3X1MDddx9+U00ykubhd063zdf0XREZHEoSaZx/fgaFrp8Bn96a3cD0QYN/3MCEHTWsWQPr12f+aTr+ptr5hbee1KcvIvmmMYkeUim4884+Clx9EnzylWg5mwTxejVseByAzuGFmprcPkGrT19EhoqSRA9f/zrs35+w8ibLvFsJouRwyOCf1ncbf+i85EWuOhOL5uGLyGBTkohJpeCDD9KsmJuCP7o48wTR+R2AZ65Jeznvgxlctqk/uZ6FiIhkQ0kiptfsoGyTA0QJ4qMj4bvJ94ZIGlMQESk0ShIx3WYHdY49ZJocIEoQHcfD6rcTi4wfr7EDERk5NLsppmt20P+cnF2CcOAQUffS6reprk5/b4ajjoK1a9VNJCIjR8EnCTNbbGavmVmzma0YzGPV1wPXnASl72c/c+nbztG/WMOGDfD4472/LLZhA/zHfyhBiMjIUtDdTWZWDNwOnAu0Ac+a2UZ3f2VQDvgHKWjOcNfhlqPVJ1Tz+M2Pw/ruqzWwLCKjQUEnCWAB0OzubwCY2X3AEmBQkkTd1syva1FSVMLHN+VygScRkZGj0LubZgBvxV63hVg3ZlZrZo1m1tje3p7zwVo7MruuxfFHH68EISJjQqEniXQjA73uRODuDe5e6e6V06dPz/lgsyb2fV2LEivBVzpv/1Xy7CURkdGk0JNEGzAz9roMeGewDlZfXU9pSe9pSeNsHBv+eIPOHkRkzCn0MYlngTlmNht4G1gG/OlgHazztqF1W+to7Whl1sRZXfeaFhEZiwo6Sbj7ATP7JrAZKAbWuvvLg3nMmrk1SgoiIkFBJwkAd38MeGy46yEiMhYV+piEiIgMIyUJERFJpCQhIiKJlCRERCSRuff6btqIZmbtQEsedjUNeC8P+xlMqmN+qI75MxLqqTqmV+7uvb6NPOqSRL6YWaO7Vw53PfqiOuaH6pg/I6GeqmN21N0kIiKJlCRERCSRkkSyhuGuQAZUx/xQHfNnJNRTdcyCxiRERCSRziRERCSRkoSIiCRSkujBzBab2Wtm1mxmK4bgeDPN7Ekze9XMXjaz60J8ipltMbMd4XlyiJuZ3Rbq96KZzYvta3kov8PMlsfi882sKWxzm5mlu5lTJnUtNrPnzezR8Hq2mW0Lx7vfzMaH+BHhdXNYXxHbx40h/pqZLYrFB9zuZjbJzB40s1+H9vxiobWjmf1F+D2/ZGY/NrMJhdCOZrbWzHaZ2Uux2KC3XdIxsqjj98Pv+0Uz+0czm5RrG+Xye8ikjrF1f21mbmbThrMds+bueoQH0eXIXwdOAMYDLwAnDvIxjwPmheVjgN8AJwLfA1aE+ArglrB8PrCJ6K59pwHbQnwK8EZ4nhyWJ4d1zwBfDNtsAs7Lsa5/Cfxf4NHw+gFgWVj+IXBNWL4W+GFYXgbcH5ZPDG16BDA7tHVxvtodWAdcFZbHA5MKqR2Jbr37JnBkrEFpGvwAAAPXSURBVP0uK4R2BL4EzANeisUGve2SjpFFHRcC48LyLbE6Zt1G2f4eMq1jiM8kuuVBCzBtONsx6/+rfO1oNDxC42+Ovb4RuHGI6/AwcC7wGnBciB0HvBaWfwR8LVb+tbD+a8CPYvEfhdhxwK9j8W7lsqhXGbAVOBt4NPyRvhf7B+1qu/DP8MWwPC6Us57t2VkuH+0OHEv0Bmw94gXTjhy+Z/uU0C6PAosKpR2BCrq/AQ962yUdI9M69lj3R0Aq3c/eXxvl8vecTR2BB4HPAzs5nCSGrR2zeai7qbvOf+JObSE2JMJp7KnANuCT7v4uQHj+RD917CveliaerdXAt4BD4fVU4H13P5Bmv111Ces7Qvls656NE4B24G6LusTuNLOjKKB2dPe3gb8DWoF3idplO4XVjnFD0XZJx8jFFUSfrnOpYy5/zxkxswuBt939hR6rCrUdu1GS6C5dH/OQzBE2s6OBnwLXu/u/91U0TcxziGdTt68Cu9x9ewb1GJY6En3Cmwfc4e6nAh8QnXYnGY52nAwsIer+OB44Cjivj/0ORztmouDqZWZ1wAEg1RnKsi65/D1nUq9SoA64Kd3qPNZx0ChJdNdG1HfYqQx4Z7APamYlRAki5e4PhfDvzOy4sP44YFc/dewrXpYmno0zgAvNbCdwH1GX02pgkpl13t0wvt+uuoT1E4E9OdQ9G21Am7tvC68fJEoahdSO5wBvunu7u+8HHgJOp7DaMW4o2i7pGBkLA7tfBWo89LfkUMf3yP73kIlPE30oeCH8/5QBz5nZp3Ko46C2Y6J89VuNhgfRp9E3wi+1c1DrpEE+pgH3Aqt7xL9P94Go74Xlr9B9sOuZEJ9C1Cc/OTzeBKaEdc+Gsp2DXecPoL5VHB64/gndB/quDcvfoPtA3wNh+SS6Dya+QTSQmJd2B34BfDYs3xzasGDaEfgC8DJQGvaxDvjzQmlHeo9JDHrbJR0jizouBl4Bpvcol3UbZft7yLSOPdbt5PCYxLC1Y1Z/t/na0Wh5EM04+A3RDIi6ITjemUSnjC8CvwqP84n6PLcCO8Jz5x+JAbeH+jUBlbF9XQE0h8flsXgl8FLY5gf0MeiWQX2rOJwkTiCabdEc/sGOCPEJ4XVzWH9CbPu6UI/XiM0Oyke7A6cAjaEt/yn8gxVUOwKrgF+H/awnehMb9nYEfkw0TrKf6BPrlUPRdknHyKKOzUT9953/Oz/MtY1y+T1kUsce63dyOEkMSztm+9BlOUREJJHGJEREJJGShIiIJFKSEBGRREoSIiKSSElCREQSKUmIiEgiJQkREUn0/wF4M7THnpv9oQAAAABJRU5ErkJggg==\n",
      "text/plain": [
       "<Figure size 432x288 with 1 Axes>"
      ]
     },
     "metadata": {
      "needs_background": "light"
     },
     "output_type": "display_data"
    }
   ],
   "source": [
    "# Use automl metrics module\n",
    "scores = metrics.compute_metrics_regression(\n",
    "    pred_automl[\"predicted\"],\n",
    "    pred_automl[target_column_name],\n",
    "    list(constants.Metric.SCALAR_REGRESSION_SET),\n",
    "    None,\n",
    "    None,\n",
    "    None,\n",
    ")\n",
    "\n",
    "print(\"[Test data scores]\\n\")\n",
    "for key, value in scores.items():\n",
    "    print(\"{}:   {:.3f}\".format(key, value))\n",
    "\n",
    "# Plot outputs\n",
    "test_pred = plt.scatter(pred_automl[target_column_name], pred_automl[\"predicted\"], color=\"b\")\n",
    "test_test = plt.scatter(pred_automl[target_column_name], pred_automl[target_column_name], color=\"g\")\n",
    "plt.legend((test_pred, test_test), (\"prediction\", \"truth\"), loc=\"upper left\", fontsize=8)\n",
    "plt.show()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "We also compute MAPE of the forecasts in the last two weeks of the forecast period in order to be consistent with the evaluation period that is used in other quick start examples."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 24,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MAPE of forecasts obtained by AutoML in the last two weeks: 124.10334037331717\n"
     ]
    }
   ],
   "source": [
    "pred_automl_sub = pred_automl.loc[pred_automl.week >= max(test_df.week) - NUM_TEST_PERIODS + GAP]\n",
    "mape_automl_sub = MAPE(pred_automl_sub[\"predicted\"], pred_automl_sub[\"move\"]) * 100\n",
    "print(\"MAPE of forecasts obtained by AutoML in the last two weeks: \" + str(mape_automl_sub))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Combine AutoML Model with a Custom Model\n",
    "\n",
    "So far we have demonstrated how we can quickly build a forecasting model with AutoML in Azure. Next, we further show a simple way to achieve more robust and accurate forecasts by combining the forecasts from AutoML and a custom model that the user may have. Here we assume that the user have also constructed a series of linear regression models with each model forecasts the sales of a specfic store-brand using `scikit-learn` package."
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Multiple linear regression models"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 25,
   "metadata": {},
   "outputs": [],
   "source": [
    "# Create price features\n",
    "df_sub[\"price\"] = df_sub.apply(lambda x: x.loc[\"price\" + str(int(x.loc[\"brand\"]))], axis=1)\n",
    "price_cols = [\n",
    "    \"price1\",\n",
    "    \"price2\",\n",
    "    \"price3\",\n",
    "    \"price4\",\n",
    "    \"price5\",\n",
    "    \"price6\",\n",
    "    \"price7\",\n",
    "    \"price8\",\n",
    "    \"price9\",\n",
    "    \"price10\",\n",
    "    \"price11\",\n",
    "]\n",
    "df_sub[\"avg_price\"] = df_sub[price_cols].sum(axis=1).apply(lambda x: x / len(price_cols))\n",
    "df_sub[\"price_ratio\"] = df_sub.apply(lambda x: x[\"price\"] / x[\"avg_price\"], axis=1)\n",
    "\n",
    "# Create lag features on unit sales\n",
    "df_sub[\"move_lag1\"] = df_sub[\"move\"].shift(1)\n",
    "df_sub[\"move_lag2\"] = df_sub[\"move\"].shift(2)\n",
    "\n",
    "# Drop rows with NaN values\n",
    "df_sub.dropna(inplace=True)"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "After splitting the data, we use `fit()` and `predit()` functions from `fclib.models.multiple_linear_regression` to train separate linear regression model for each invididual time series and generate forecasts for the sales during the test period."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 26,
   "metadata": {},
   "outputs": [
    {
     "data": {
      "text/html": [
       "<div>\n",
       "<style scoped>\n",
       "    .dataframe tbody tr th:only-of-type {\n",
       "        vertical-align: middle;\n",
       "    }\n",
       "\n",
       "    .dataframe tbody tr th {\n",
       "        vertical-align: top;\n",
       "    }\n",
       "\n",
       "    .dataframe thead th {\n",
       "        text-align: right;\n",
       "    }\n",
       "</style>\n",
       "<table border=\"1\" class=\"dataframe\">\n",
       "  <thead>\n",
       "    <tr style=\"text-align: right;\">\n",
       "      <th></th>\n",
       "      <th>week_start</th>\n",
       "      <th>prediction</th>\n",
       "      <th>store</th>\n",
       "      <th>brand</th>\n",
       "      <th>week</th>\n",
       "      <th>logmove</th>\n",
       "      <th>constant</th>\n",
       "      <th>price1</th>\n",
       "      <th>price2</th>\n",
       "      <th>price3</th>\n",
       "      <th>...</th>\n",
       "      <th>price11</th>\n",
       "      <th>deal</th>\n",
       "      <th>feat</th>\n",
       "      <th>profit</th>\n",
       "      <th>move</th>\n",
       "      <th>price</th>\n",
       "      <th>avg_price</th>\n",
       "      <th>price_ratio</th>\n",
       "      <th>move_lag1</th>\n",
       "      <th>move_lag2</th>\n",
       "    </tr>\n",
       "  </thead>\n",
       "  <tbody>\n",
       "    <tr>\n",
       "      <th>0</th>\n",
       "      <td>1992-04-16</td>\n",
       "      <td>12507</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>136</td>\n",
       "      <td>8.59</td>\n",
       "      <td>1</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>...</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>33.54</td>\n",
       "      <td>5376</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.04</td>\n",
       "      <td>1.27</td>\n",
       "      <td>12416.00</td>\n",
       "      <td>28096.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>1</th>\n",
       "      <td>1992-04-23</td>\n",
       "      <td>17664</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>137</td>\n",
       "      <td>9.19</td>\n",
       "      <td>1</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>...</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>20.43</td>\n",
       "      <td>9792</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.04</td>\n",
       "      <td>1.11</td>\n",
       "      <td>5376.00</td>\n",
       "      <td>12416.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>2</th>\n",
       "      <td>1992-04-30</td>\n",
       "      <td>21670</td>\n",
       "      <td>2</td>\n",
       "      <td>1</td>\n",
       "      <td>138</td>\n",
       "      <td>9.74</td>\n",
       "      <td>1</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.05</td>\n",
       "      <td>...</td>\n",
       "      <td>0.03</td>\n",
       "      <td>1</td>\n",
       "      <td>1.00</td>\n",
       "      <td>11.29</td>\n",
       "      <td>16960</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.94</td>\n",
       "      <td>9792.00</td>\n",
       "      <td>5376.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>3</th>\n",
       "      <td>1992-04-16</td>\n",
       "      <td>9551</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>136</td>\n",
       "      <td>9.14</td>\n",
       "      <td>1</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>...</td>\n",
       "      <td>0.03</td>\n",
       "      <td>1</td>\n",
       "      <td>0.00</td>\n",
       "      <td>27.13</td>\n",
       "      <td>9312</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.04</td>\n",
       "      <td>1.21</td>\n",
       "      <td>11424.00</td>\n",
       "      <td>4992.00</td>\n",
       "    </tr>\n",
       "    <tr>\n",
       "      <th>4</th>\n",
       "      <td>1992-04-23</td>\n",
       "      <td>7452</td>\n",
       "      <td>2</td>\n",
       "      <td>2</td>\n",
       "      <td>137</td>\n",
       "      <td>8.74</td>\n",
       "      <td>1</td>\n",
       "      <td>0.04</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.05</td>\n",
       "      <td>...</td>\n",
       "      <td>0.03</td>\n",
       "      <td>0</td>\n",
       "      <td>0.00</td>\n",
       "      <td>33.30</td>\n",
       "      <td>6240</td>\n",
       "      <td>0.05</td>\n",
       "      <td>0.04</td>\n",
       "      <td>1.39</td>\n",
       "      <td>9312.00</td>\n",
       "      <td>11424.00</td>\n",
       "    </tr>\n",
       "  </tbody>\n",
       "</table>\n",
       "<p>5 rows × 27 columns</p>\n",
       "</div>"
      ],
      "text/plain": [
       "  week_start  prediction  store  brand  week  logmove  constant  price1  \\\n",
       "0 1992-04-16  12507       2      1      136  8.59      1        0.05      \n",
       "1 1992-04-23  17664       2      1      137  9.19      1        0.04      \n",
       "2 1992-04-30  21670       2      1      138  9.74      1        0.04      \n",
       "3 1992-04-16  9551        2      2      136  9.14      1        0.05      \n",
       "4 1992-04-23  7452        2      2      137  8.74      1        0.04      \n",
       "\n",
       "   price2  price3    ...      price11  deal  feat  profit   move  price  \\\n",
       "0 0.05    0.05       ...     0.03      0    0.00  33.54    5376  0.05     \n",
       "1 0.05    0.05       ...     0.03      0    0.00  20.43    9792  0.04     \n",
       "2 0.04    0.05       ...     0.03      1    1.00  11.29    16960 0.04     \n",
       "3 0.05    0.05       ...     0.03      1    0.00  27.13    9312  0.05     \n",
       "4 0.05    0.05       ...     0.03      0    0.00  33.30    6240  0.05     \n",
       "\n",
       "   avg_price  price_ratio  move_lag1  move_lag2  \n",
       "0 0.04       1.27         12416.00   28096.00    \n",
       "1 0.04       1.11         5376.00    12416.00    \n",
       "2 0.04       0.94         9792.00    5376.00     \n",
       "3 0.04       1.21         11424.00   4992.00     \n",
       "4 0.04       1.39         9312.00    11424.00    \n",
       "\n",
       "[5 rows x 27 columns]"
      ]
     },
     "execution_count": 26,
     "metadata": {},
     "output_type": "execute_result"
    }
   ],
   "source": [
    "# Split data into training and test sets\n",
    "train_df, test_df = split_last_n_by_grain(df_sub, NUM_TEST_PERIODS)\n",
    "train_df.reset_index(drop=True)\n",
    "test_df.reset_index(drop=True)\n",
    "\n",
    "# Train multiple linear regression models\n",
    "fea_column_names = [\"move_lag1\", \"move_lag2\", \"price\", \"price_ratio\"]\n",
    "lr_models = fit(train_df, grain_column_names, fea_column_names, target_column_name)\n",
    "\n",
    "# Generate forecasts with the trained models\n",
    "pred_all = predict(test_df, lr_models, time_column_name, grain_column_names, fea_column_names)\n",
    "\n",
    "pred_lr = pd.merge(pred_all, test_df, on=index_column_names)\n",
    "pred_lr.head()"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Let's check the accuracy of the predictions on the entire forecast period as well as in the last two weeks of the forecast period.\n",
    "\n",
    "\n",
    "\n"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 27,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MAPE of forecasts obtained by multiple linear regression on entire test period: 83.90865445283927\n"
     ]
    }
   ],
   "source": [
    "mape_lr_entire = MAPE(pred_lr[\"prediction\"], pred_lr[\"move\"]) * 100\n",
    "print(\"MAPE of forecasts obtained by multiple linear regression on entire test period: \" + str(mape_lr_entire))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 28,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MAPE of forecasts obtained by multiple linear regression in the last two weeks: 72.11741385279376\n"
     ]
    }
   ],
   "source": [
    "pred_lr_sub = pred_lr.loc[pred_lr.week >= max(test_df.week) - NUM_TEST_PERIODS + GAP]\n",
    "mape_lr_sub = MAPE(pred_lr_sub[\"prediction\"], pred_lr_sub[\"move\"]) * 100\n",
    "print(\"MAPE of forecasts obtained by multiple linear regression in the last two weeks: \" + str(mape_lr_sub))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "### Combine forecasts from different methods\n",
    "\n",
    "We can combine the forecasts obtained by AutoML and multiple linear regression using weighted average and evaluate the final forecasts. Usually the combined forecasts will be more robust as a combination of two methods can reduce the chance of model overfitting. Here we use equal weights which can be further adjusted according to our confidence on each model."
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 29,
   "metadata": {},
   "outputs": [],
   "source": [
    "pred_final = pd.merge(\n",
    "    pred_automl[index_column_names + [\"predicted\", \"move\", \"week\"]],\n",
    "    pred_lr[index_column_names + [\"prediction\"]],\n",
    "    on=index_column_names,\n",
    "    how=\"left\",\n",
    ")\n",
    "pred_final[\"combined_prediction\"] = pred_final[\"predicted\"] * 0.5 + pred_final[\"prediction\"] * 0.5"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 30,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MAPE of forecasts obtained by the combined model on entire test period: 87.2964359857758\n"
     ]
    }
   ],
   "source": [
    "mape_entire = MAPE(pred_final[\"combined_prediction\"], pred_final[\"move\"]) * 100\n",
    "print(\"MAPE of forecasts obtained by the combined model on entire test period: \" + str(mape_entire))"
   ]
  },
  {
   "cell_type": "code",
   "execution_count": 31,
   "metadata": {},
   "outputs": [
    {
     "name": "stdout",
     "output_type": "stream",
     "text": [
      "MAPE of forecasts obtained by the combined model in the last two weeks: 84.39534839261313\n"
     ]
    }
   ],
   "source": [
    "pred_final_sub = pred_final.loc[pred_final.week >= max(test_df.week) - NUM_TEST_PERIODS + GAP]\n",
    "mape_final_sub = MAPE(pred_final_sub[\"combined_prediction\"], pred_final_sub[\"move\"]) * 100\n",
    "print(\"MAPE of forecasts obtained by the combined model in the last two weeks: \" + str(mape_final_sub))"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "## Additional Reading\n",
    "\n",
    "\\[1\\] Nicolo Fusi, Rishit Sheth, and Melih Elibol. 2018. Probabilistic Matrix Factorization for Automated Machine Learning. In Advances in Neural Information Processing Systems. 3348-3357.<br>\n",
    "\\[2\\] Azure AutoML Package Docs: https://docs.microsoft.com/en-us/python/api/azureml-train-automl/azureml.train.automl?view=azure-ml-py <br>\n",
    "\\[3\\] Azure Automated Machine Learning Examples: https://github.com/Azure/MachineLearningNotebooks/tree/master/how-to-use-azureml/automated-machine-learning <br>\n",
    "\n",
    "\n"
   ]
  }
 ],
 "metadata": {
  "author_info": {
   "affiliation": "Microsoft",
   "created_by": "Chenhui Hu"
  },
  "kernelspec": {
   "display_name": "forecasting_env",
   "language": "python",
   "name": "forecasting_env"
  },
  "language_info": {
   "codemirror_mode": {
    "name": "ipython",
    "version": 3
   },
   "file_extension": ".py",
   "mimetype": "text/x-python",
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
   "version": "3.6.10"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 4
}
